
    Data and methods for a visual understanding of sign languages

    Signed languages are complete and natural languages used as the first or preferred mode of communication by millions of people worldwide. However, they unfortunately continue to be marginalized languages. Designing, building, and evaluating models that work on sign languages presents compelling research challenges and requires interdisciplinary and collaborative efforts. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) have the power to enable better accessibility for sign language users and to narrow the existing communication barrier between the Deaf community and non-sign language users. However, recent AI-powered technologies still do not account for sign language in their pipelines. This is mainly because sign languages are visual languages that use manual and non-manual features to convey information and do not have a standard written form. The goal of this thesis is therefore to contribute to the development of new technologies that account for sign language by creating large-scale multimodal resources suitable for training modern data-hungry machine learning models, and by developing automatic systems for computer vision tasks that aim at a better visual understanding of sign languages. In Part I, we introduce the How2Sign dataset, a large-scale collection of multimodal and multiview sign language videos in American Sign Language. In Part II, we contribute to the development of technologies that account for sign languages: in Chapter 4 we present Spot-Align, a framework based on sign spotting methods for automatically annotating sign instances in continuous sign language, present its benefits, and establish a baseline for the sign language recognition task on the How2Sign dataset. In Chapter 5, we leverage the different annotations and modalities of How2Sign to explore sign language video retrieval by learning cross-modal embeddings. Finally, in Chapter 6, we explore sign language video generation by applying Generative Adversarial Networks to the sign language domain and assess whether, and how well, sign language users can understand automatically generated sign language videos, proposing an evaluation protocol based on How2Sign topics and English translations.

    Sign language video retrieval with free-form textual queries

    Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology. However, the problem of searching videos beyond individual keywords has received limited attention in the literature. To address this gap, in this work we introduce the task of sign language retrieval with textual queries: given a written query (e.g. a sentence) and a large collection of sign language videos, the objective is to find the signing video that best matches the written query. We propose to tackle this task by learning cross-modal embeddings on the recently introduced large-scale How2Sign dataset of American Sign Language (ASL). We identify that a key bottleneck in the performance of the system is the quality of the sign video embedding, which suffers from a scarcity of labelled training data. We therefore propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data. We validate the effectiveness of SPOT-ALIGN for learning a robust sign video embedding through improvements in both sign recognition and the proposed video retrieval task. This work was supported by the project PID2020-117142GB-I00, funded by MCIN/AEI/10.13039/501100011033, ANR project CorVis ANR-21-CE23-0003-01, and gifts from Google and Adobe. AD received support from la Caixa Foundation (ID 100010434), fellowship code LCF/BQ/IN18/11660029.
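    The retrieval approach above learns a joint embedding space for written queries and sign videos. As a rough, hypothetical sketch of that general idea (not the paper's actual architecture), the snippet below trains a dual encoder with a symmetric contrastive loss in PyTorch; the module names, feature dimensions, and toy batch are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Projects text and sign-video features into a shared embedding space (illustrative only)."""
    def __init__(self, text_dim=768, video_dim=1024, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.video_proj = nn.Linear(video_dim, embed_dim)

    def forward(self, text_feats, video_feats):
        # L2-normalise so that dot products are cosine similarities
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        return t, v

def contrastive_loss(t, v, temperature=0.07):
    """Symmetric InfoNCE: matched text/video pairs lie on the diagonal of the similarity matrix."""
    logits = t @ v.T / temperature
    targets = torch.arange(t.size(0), device=t.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy batch: 8 query sentences and their 8 matching sign videos (random features as placeholders)
model = DualEncoder()
text_feats = torch.randn(8, 768)    # e.g. sentence-encoder outputs
video_feats = torch.randn(8, 1024)  # e.g. pooled I3D clip features
t, v = model(text_feats, video_feats)
loss = contrastive_loss(t, v)
print(loss.item())
```

    At query time, retrieval would amount to encoding the written query once, encoding each candidate video, and ranking candidates by cosine similarity in this shared space.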

    Sign language translation from instructional videos

    The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation instead of the widely used BLEU score. We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances. This research was partially supported by research grant Adavoice PID2019-107579RB-I00 / AEI / 10.13039/501100011033, research grants PRE2020-094223, PID2021-126248OB-I00 and PID2019-107255GB-C21, and by Generalitat de Catalunya (AGAUR) under grant agreement 2021-SGR-00478.
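    To illustrate the training setup sketched in this abstract (a Transformer consuming I3D clip features and emitting spoken-language tokens, with BLEU-style validation), here is a minimal, hypothetical PyTorch sketch; the dimensions, vocabulary size, and example sentences are assumptions, and the standard sacrebleu corpus BLEU is shown rather than the reduced BLEU variant used in the paper.

```python
import torch
import torch.nn as nn
import sacrebleu

class SLTTransformer(nn.Module):
    """Encoder-decoder Transformer: I3D clip features in, text token logits out (illustrative only)."""
    def __init__(self, i3d_dim=1024, vocab_size=8000, d_model=512):
        super().__init__()
        self.feat_proj = nn.Linear(i3d_dim, d_model)        # map I3D features to model width
        self.tok_embed = nn.Embedding(vocab_size, d_model)  # target-token embeddings
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, video_feats, tgt_tokens):
        src = self.feat_proj(video_feats)                   # (B, T_video, d_model)
        tgt = self.tok_embed(tgt_tokens)                    # (B, T_text, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.out(self.transformer(src, tgt, tgt_mask=mask))

# Toy forward pass: 2 clips of 64 feature steps, 2 target sentences of 10 tokens
model = SLTTransformer()
feats = torch.randn(2, 64, 1024)
tokens = torch.randint(0, 8000, (2, 10))
logits = model(feats, tokens)                              # (2, 10, 8000)

# Corpus-level BLEU on held-out translations with sacrebleu
hyps = ["the weather is nice today"]
refs = [["the weather is very nice today"]]
print(sacrebleu.corpus_bleu(hyps, refs).score)
```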

    How2Sign: A large-scale multimodal dataset for continuous American sign language

    One of the factors that have hindered progress in the areas of sign language recognition, translation, and production is the absence of large annotated datasets. Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was further recorded in the Panoptic studio enabling detailed 3D pose estimation. To evaluate the potential of How2Sign for real-world impact, we conduct a study with ASL signers and show that synthesized videos using our dataset can indeed be understood. The study further gives insights on challenges that computer vision should address in order to make progress in this field. Dataset website: http://how2sign.github.io/ This work received funding from Facebook through gifts to CMU and UPC; through projects TEC2016-75976-R, TIN2015-65316-P, SEV-2015-0493 and PID2019-107255GB-C22 of the Spanish Government and 2017-SGR-1414 of Generalitat de Catalunya. This work used XSEDE’s “Bridges” system at the Pittsburgh Supercomputing Center (NSF award ACI-1445606). Amanda Duarte has received support from la Caixa Foundation (ID 100010434) under the fellowship code LCF/BQ/IN18/11660029. Shruti Palaskar was supported by the Facebook Fellowship program.

    Traditional Drugs: Mechanisms of Immunosuppressor and Corticosteroid Therapies for Inflammatory Bowel Diseases

    The inflammatory bowel diseases (IBD) such as Crohn’s disease and ulcerative colitis are immunological dysfunctions of the gastrointestinal tract that develop because of multifactorial processes, including genetic predisposition, gut dysbiosis, and excessive inflammation in susceptible subjects. These pathologies affect millions of people worldwide, with substantial impact on healthcare systems and patients’ quality of life. Considering the chronic inflammation that underlies the IBD presentation, the main treatment options are related to the control of patients’ inflammatory response, through immunosuppressor and modulatory therapies. Therefore, in this chapter we review the main mechanisms associated with the treatments that are aimed at suppressing mucosal immunity and the effects of corticosteroid therapies in Crohn’s disease and ulcerative colitis.

    Association between nutritional status, ostomy time and quality of life in patients with colorectal cancer

    Background: Ostomy may be necessary for patients who undergo bowel resection, but it can influence nutritional status and quality of life (QoL). The aim of this study was to evaluate the influence of ostomy time and nutritional status on QoL. Methods: A cross-sectional study was performed with 66 patients with an ostomy due to colorectal cancer in a reference service. Socioeconomic, demographic, anthropometric, and QoL data were obtained. Other clinical and surgical data were registered from the clinical records. The anthropometric data were weight and height, from which the Body Mass Index (BMI) was calculated. To evaluate QoL, the European Organization for Research and Treatment of Cancer questionnaires EORTC QLQ-C30 and EORTC QLQ-CR29 were used. Statistical significance was assessed using analysis of variance or the chi-square test. Results: Of the 66 individuals, 51.5% were male, 75.8% were 55 years of age or older, and 56.3% had had an ostomy for less than 1 year. Over half of the patients had some nutritional status inadequacy: 23.4% were underweight, 20.3% overweight and 9.45% obese. Longer ostomy time and malnutrition influenced QoL in patients with colorectal cancer. Shorter ostomy time was associated with the financial difficulties domain (p=0.045) and longer ostomy time with urinary incontinence (p=0.046), while malnutrition was associated with sleep disturbance (p=0.019), abdominal pain (p=0.028), bloating (p=0.011), concern about weight (p=0.002) and female sexual interest (p=0.038). Conclusions: The current study revealed that ostomy time and nutritional status influence QoL in patients with colorectal cancer with a postoperative ostomy.

    Survival and predictors of death in tuberculosis/HIV coinfection cases in Porto Alegre, Brazil: a historical cohort from 2009 to 2013

    Background: Tuberculosis is a curable disease, yet it remains the leading cause of death among infectious diseases worldwide, and it is the leading cause of death in people living with HIV. The purpose is to examine survival and predictors of death in tuberculosis/HIV coinfection cases from 2009 to 2013. Methods: We estimated the survival of 2,417 TB/HIV coinfection cases in Porto Alegre, from diagnosis up to 85 months of follow-up. We estimated hazard ratios and survival curves. Results: The adjusted risk ratio (aRR) for death, controlling for age, hospitalization, and Directly Observed Treatment, was 4.58 for new cases (95% CI: 1.14–18.4), 4.51 for recurrence (95% CI: 1.11–18.4) and 4.53 for return after abandonment (95% CI: 1.12–18.4). The average survival time was 72.56 ± 1.57 months for those who underwent Directly Observed Treatment and 62.61 ± 0.77 months for those who did not. Conclusions: Case classification, age, and hospitalization are predictors of death. Directly Observed Treatment was a protective factor that increased the probability of survival. Policies aimed at reducing the mortality of patients with TB/HIV coinfection are needed.
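    The survival estimates above come from standard time-to-event analysis: survival curves plus hazard ratios from a regression model. The sketch below shows what such an analysis typically looks like with the lifelines library; the toy data and covariate names are invented for illustration and do not reproduce the study's cohort.

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

# Hypothetical toy cohort: follow-up time (months), death indicator, and two covariates
df = pd.DataFrame({
    "months": [12, 85, 30, 5, 60, 72, 20, 45, 64, 8],
    "death":  [1, 0, 1, 1, 0, 1, 1, 0, 0, 1],
    "age":    [34, 50, 41, 62, 29, 55, 47, 38, 60, 44],
    "dot":    [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],   # Directly Observed Treatment (0/1)
})

# Kaplan-Meier estimate of the survival curve over follow-up
km = KaplanMeierFitter()
km.fit(df["months"], event_observed=df["death"])
print(km.survival_function_)

# Cox proportional-hazards model: exponentiated coefficients are hazard ratios
cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="death")
cph.print_summary()
```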

    Epidemiological aspects and oral implications of Paracoccidioidomycosis infection: an integrative review

    Paracoccidioidomycosis (PCM) is a systemic mycosis caused by the dimorphic fungus Paracoccidioides brasiliensis. It represents a significant infection in South America, occurring mainly in tropical and subtropical countries such as Brazil. Oral mucosal lesions, which are the most important symptom in dentistry, may be the first visible physical manifestation of the disease, often preceding even pulmonary lesions. This study aims to carry out an integrative literature review to identify the main epidemiological aspects and oral implications of Paracoccidioidomycosis (PCM) infection. A search was carried out in the PubMed, LILACS and SciELO databases and, after applying the inclusion, exclusion, eligibility and thematic relevance criteria, 18 articles were selected for analysis in this study. PCM is endemic in Brazil, mainly affects middle-aged and elderly men, and occurs mostly in rural areas. It is a systemic disease whose clinical manifestations often include oral lesions. Dentists play a key role in identifying these lesions and in the correct diagnosis and treatment of this disease. Making PCM notification compulsory in Brazil is essential.